Assignment 2: Spatial Analysis and Visualization

Healthcare Access and Equity in Pennsylvania

Author

Jinheng

Published

October 7, 2025

Assignment Overview

Learning Objectives: - Apply spatial operations to answer policy-relevant research questions - Integrate census demographic data with spatial analysis - Create publication-quality visualizations and maps - Work with spatial data from multiple sources - Communicate findings effectively for policy audiences


Part 1: Healthcare Access for Vulnerable Populations

Research Question

Which Pennsylvania counties have the highest proportion of vulnerable populations (elderly + low-income) living far from hospitals?

Your analysis should identify counties that should be priorities for healthcare investment and policy intervention.

Required Analysis Steps

Complete the following analysis, documenting each step with code and brief explanations:

Step 1: Data Collection (5 points)

Load the required spatial data: - Pennsylvania county boundaries - Pennsylvania hospitals (from lecture data) - Pennsylvania census tracts

Your Task:

# Load required packages
library(sf)
library(tidyverse)
library(tigris)
library(tidycensus)
library(scales)
library(patchwork)
library(here)
library(knitr)

# Load spatial data
pa_counties <- st_read("data/Pennsylvania_County_Boundaries.shp")
Reading layer `Pennsylvania_County_Boundaries' from data source 
  `/Users/jinhengcen/Documents/portfolio-setup-CenJinHeng/assignments/assignment_2/data/Pennsylvania_County_Boundaries.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 67 features and 19 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -8963377 ymin: 4825316 xmax: -8314404 ymax: 5201413
Projected CRS: WGS 84 / Pseudo-Mercator
hospitals <- st_read("data/hospitals.geojson")
Reading layer `hospitals' from data source 
  `/Users/jinhengcen/Documents/portfolio-setup-CenJinHeng/assignments/assignment_2/data/hospitals.geojson' 
  using driver `GeoJSON'
Simple feature collection with 223 features and 11 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -80.49621 ymin: 39.75163 xmax: -74.86704 ymax: 42.13403
Geodetic CRS:  WGS 84
census_tracts <- tracts(state = "PA", cb = TRUE)

  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |                                                                      |   1%
  |                                                                            
  |=                                                                     |   1%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |===                                                                   |   5%
  |                                                                            
  |====                                                                  |   5%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |=====                                                                 |   8%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |=========                                                             |  12%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |============                                                          |  18%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |==============                                                        |  19%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |===============                                                       |  21%
  |                                                                            
  |================                                                      |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |====================                                                  |  28%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |======================                                                |  32%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |========================                                              |  34%
  |                                                                            
  |========================                                              |  35%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |==========================                                            |  38%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |==============================                                        |  42%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |================================                                      |  45%
  |                                                                            
  |================================                                      |  46%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |==================================                                    |  49%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================                                  |  51%
  |                                                                            
  |=====================================                                 |  52%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |======================================                                |  54%
  |                                                                            
  |=======================================                               |  55%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |=========================================                             |  58%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |=============================================                         |  65%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |===============================================                       |  68%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |==================================================                    |  71%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |=====================================================                 |  75%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |=======================================================               |  79%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |=========================================================             |  82%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |===========================================================           |  85%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |==============================================================        |  88%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |===================================================================   |  95%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |===================================================================== |  98%
  |                                                                            
  |===================================================================== |  99%
  |                                                                            
  |======================================================================| 100%
hospitals <- st_transform(hospitals, st_crs(pa_counties))
census_tracts <- st_transform(census_tracts, st_crs(pa_counties))

# Check that all data loaded correctly
pa_counties
Simple feature collection with 67 features and 19 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -8963377 ymin: 4825316 xmax: -8314404 ymax: 5201413
Projected CRS: WGS 84 / Pseudo-Mercator
First 10 features:
   OBJECTID MSLINK COUNTY_NAM COUNTY_NUM FIPS_COUNT COUNTY_ARE COUNTY_PER
1       336     46 MONTGOMERY         46        091       <NA>       <NA>
2       337      8   BRADFORD         08        015       <NA>       <NA>
3       338      9      BUCKS         09        017       <NA>       <NA>
4       339     58      TIOGA         58        117       <NA>       <NA>
5       340     59      UNION         59        119       <NA>       <NA>
6       341     60    VENANGO         60        121       <NA>       <NA>
7       342     62 WASHINGTON         62        125       <NA>       <NA>
8       343     63      WAYNE         63        127       <NA>       <NA>
9       344     42     MCKEAN         42        083       <NA>       <NA>
10      345     43     MERCER         43        085       <NA>       <NA>
   NUMERIC_LA COUNTY_N_1 AREA_SQ_MI SOUND SPREAD_SHE IMAGE_NAME NOTE_FILE VIDEO
1           5         46   487.4271  <NA>       <NA>   poll.bmp      <NA>  <NA>
2           2          8  1161.3379  <NA>       <NA>   poll.bmp      <NA>  <NA>
3           5          9   622.0836  <NA>       <NA>   poll.bmp      <NA>  <NA>
4           2         58  1137.2480  <NA>       <NA>   poll.bmp      <NA>  <NA>
5           2         59   319.1893  <NA>       <NA>   poll.bmp      <NA>  <NA>
6           3         60   683.3676  <NA>       <NA>   poll.bmp      <NA>  <NA>
7           1         62   862.1077  <NA>       <NA>   poll.bmp      <NA>  <NA>
8           2         63   750.8286  <NA>       <NA>   poll.bmp      <NA>  <NA>
9           1         42   985.2700  <NA>       <NA>   poll.bmp      <NA>  <NA>
10          3         43   682.3598  <NA>       <NA>   poll.bmp      <NA>  <NA>
   DISTRICT_N PA_CTY_COD MAINT_CTY_ DISTRICT_O                       geometry
1          06         46          4        6-4 MULTIPOLYGON (((-8398884 48...
2          03         08          9        3-9 MULTIPOLYGON (((-8558633 51...
3          06         09          1        6-1 MULTIPOLYGON (((-8367360 49...
4          03         59          7        3-7 MULTIPOLYGON (((-8558633 51...
5          03         60          8        3-8 MULTIPOLYGON (((-8562865 49...
6          01         61          5        1-5 MULTIPOLYGON (((-8870781 50...
7          12         63          4       12-4 MULTIPOLYGON (((-8935296 48...
8          04         64          6        4-6 MULTIPOLYGON (((-8368003 50...
9          02         42          5        2-5 MULTIPOLYGON (((-8705967 51...
10         01         43          4        1-4 MULTIPOLYGON (((-8956031 50...
hospitals
Simple feature collection with 223 features and 11 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -8960798 ymin: 4829916 xmax: -8334160 ymax: 5181077
Projected CRS: WGS 84 / Pseudo-Mercator
First 10 features:
                  CHIEF_EXEC              CHIEF_EX_1
1              Peter J Adamo               President
2           Autumn DeShields Chief Executive Officer
3               Shawn Parekh Chief Executive Officer
4                DIANE HRITZ Chief Executive Officer
5             Tim Harclerode Chief Executive Officer
6  Richard McLaughlin MD MBA Chief Executive Officer
7             Laura Murnyack             Interim CEO
8                  Adam Beck Chief Executive Officer
9                Pamela Keen Chief Executive Officer
10              Mark Papalia           President/CEO
                                 FACILITY_U LONGITUDE       COUNTY
1              https://www.phhealthcare.org -79.91131   Washington
2                 https://www.malvernbh.com -75.17005 Philadelphia
3            https://roxboroughmemorial.com -75.20963 Philadelphia
4                https://www.ashospital.net -80.27907   Washington
5                 https://www.conemaugh.org -79.02513     Somerset
6                   https://towerhealth.org -75.61213   Montgomery
7         https://bucktailmedicalcenter.org -77.73649      Clinton
8  https://www.selectspecialtyhospitals.com -76.88013      Dauphin
9          https://www.childrenshomepgh.org -79.93736    Allegheny
10    https://www.kanecommunityhospital.com -78.81705       McKean
                                                  FACILITY_N
1                                  Penn Highlands Mon Valley
2                                  MALVERN BEHAVIORAL HEALTH
3                               Roxborough Memorial Hospital
4                                 ADVANCED SURGICAL HOSPITAL
5                    DLP Conemaugh Meyersdale Medical Center
6                                    Pottstown Hospital, LLC
7                                    Bucktail medical Center
8  SELECT SPECIALTY HOSPITAL CENTRAL PENNSYLVANIA HARRISBURG
9                          The Children's Home of Pittsburgh
10              University of Pittsburgh Medical Center Kane
                           STREET   CITY_OR_BO LATITUDE   TELEPHONE_ ZIP_CODE
1          1163 Country Club Road  Monongahela 40.18193 724-258-1000    15063
2  1930 South Broad Street Unit 4 Philadelphia 39.92619 610-480-8919    19145
3               5800 Ridge Avenue Philadelphia 40.02869 215-483-9900    19128
4        100 TRICH DRIVE\nSUITE 1   WASHINGTON 40.15655   7248840710    15301
5              200 Hospital Drive   Meyersdale 39.80913 814-634-5911    15552
6           1600 East High Street    Pottstown 40.24273   6103277000    19464
7                1001 Pine Street       Renovo 41.32789 570-923-1000    17764
8          111 South Front Street   Harrisburg 40.25841 717-724-6604    17110
9                5324 Penn Avenue   Pittsburgh 40.46424 412-441-4884    15224
10                4372 US Route 6         Kane 41.67188   8148378585    16735
                   geometry
1  POINT (-8895686 4892415)
2  POINT (-8367892 4855223)
3  POINT (-8372298 4870112)
4  POINT (-8936626 4888717)
5  POINT (-8797037 4838244)
6  POINT (-8417104 4901278)
7  POINT (-8653586 5060826)
8  POINT (-8558256 4903566)
9  POINT (-8898587 4933636)
10 POINT (-8773874 5111954)
census_tracts
Simple feature collection with 3445 features and 13 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -8963433 ymin: 4825308 xmax: -8314399 ymax: 5201489
Projected CRS: WGS 84 / Pseudo-Mercator
First 10 features:
   STATEFP COUNTYFP TRACTCE              GEOIDFQ       GEOID   NAME
1       42      001  031101 1400000US42001031101 42001031101 311.01
2       42      013  100400 1400000US42013100400 42013100400   1004
3       42      013  100500 1400000US42013100500 42013100500   1005
4       42      013  100800 1400000US42013100800 42013100800   1008
5       42      013  101900 1400000US42013101900 42013101900   1019
6       42      011  011200 1400000US42011011200 42011011200    112
7       42      011  000200 1400000US42011000200 42011000200      2
8       42      011  011500 1400000US42011011500 42011011500    115
9       42      011  000600 1400000US42011000600 42011000600      6
10      42      011  001900 1400000US42011001900 42011001900     19
              NAMELSAD STUSPS   NAMELSADCO   STATE_NAME LSAD   ALAND AWATER
1  Census Tract 311.01     PA Adams County Pennsylvania   CT 3043185      0
2    Census Tract 1004     PA Blair County Pennsylvania   CT  993724      0
3    Census Tract 1005     PA Blair County Pennsylvania   CT 1130204      0
4    Census Tract 1008     PA Blair County Pennsylvania   CT  996553      0
5    Census Tract 1019     PA Blair County Pennsylvania   CT  573726      0
6     Census Tract 112     PA Berks County Pennsylvania   CT 1539365   9308
7       Census Tract 2     PA Berks County Pennsylvania   CT 1949529 159015
8     Census Tract 115     PA Berks County Pennsylvania   CT 1978380  12469
9       Census Tract 6     PA Berks County Pennsylvania   CT 1460473      0
10     Census Tract 19     PA Berks County Pennsylvania   CT  182420      0
                         geometry
1  MULTIPOLYGON (((-8575060 48...
2  MULTIPOLYGON (((-8730207 49...
3  MULTIPOLYGON (((-8729297 49...
4  MULTIPOLYGON (((-8728636 49...
5  MULTIPOLYGON (((-8728378 49...
6  MULTIPOLYGON (((-8455198 49...
7  MULTIPOLYGON (((-8455907 49...
8  MULTIPOLYGON (((-8460184 49...
9  MULTIPOLYGON (((-8450851 49...
10 MULTIPOLYGON (((-8451174 49...

Questions to answer: - How many hospitals are in your dataset? 223 - How many census tracts? 3445 - What coordinate reference system is each dataset in? I converted them all in WGS 84 / Pseudo-Mercator —

Step 2: Get Demographic Data

Use tidycensus to download tract-level demographic data for Pennsylvania.

Required variables: - Total population - Median household income - Population 65 years and over (you may need to sum multiple age categories)

Your Task:

# Get demographic data from ACS
tract_data <- get_acs(
  geography = "tract",
  state     = "42",
  variables = c(
    total_population = "B01003_001",
    median_household_income = "B19013_001"
  ),
  year = 2023,
  survey = "acs5",
  output = "wide"
)

age_data <- get_acs(
  geography = "tract",
  state     = "42",
  variables = c(
    male_65 <- c("B01001_020","B01001_021","B01001_022","B01001_023","B01001_024","B01001_025"),
    female_65 <- c("B01001_044","B01001_045","B01001_046","B01001_047","B01001_048","B01001_049")
  ),
  year = 2023,
  survey = "acs5",
) %>% 
  group_by (GEOID) %>%
  summarise(pop_65 = sum(estimate, na.rm = TRUE))

age_data
# A tibble: 3,446 × 2
   GEOID       pop_65
   <chr>        <dbl>
 1 42001030101    577
 2 42001030103    570
 3 42001030104    858
 4 42001030200   1070
 5 42001030300    782
 6 42001030400   1231
 7 42001030500    788
 8 42001030600   1129
 9 42001030700   1216
10 42001030801    762
# ℹ 3,436 more rows
tract_data <- tract_data %>%
  left_join(age_data, by = "GEOID")

tract_data
# A tibble: 3,446 × 7
   GEOID       NAME   total_populationE total_populationM median_household_inc…¹
   <chr>       <chr>              <dbl>             <dbl>                  <dbl>
 1 42001030101 Censu…              2666                29                  82716
 2 42001030103 Censu…              2414               332                 111227
 3 42001030104 Censu…              3417               333                  66848
 4 42001030200 Censu…              5379               502                  72431
 5 42001030300 Censu…              4390               205                  84643
 6 42001030400 Censu…              5557               245                  97694
 7 42001030500 Censu…              3761               232                  69583
 8 42001030600 Censu…              4884                30                  85625
 9 42001030700 Censu…              6674               356                  80585
10 42001030801 Censu…              3357               320                  82500
# ℹ 3,436 more rows
# ℹ abbreviated name: ¹​median_household_incomeE
# ℹ 2 more variables: median_household_incomeM <dbl>, pop_65 <dbl>
tract_data %>%
  summarise(missing_data = sum(is.na(median_household_incomeE)))
# A tibble: 1 × 1
  missing_data
         <int>
1           66
tract_data %>%
  summarise(median_income = sum(median(median_household_incomeE, na.rm = TRUE)))
# A tibble: 1 × 1
  median_income
          <dbl>
1        72944.
# Join to tract boundaries

census_tracts_joined <- census_tracts %>%
  left_join(tract_data, by = "GEOID")

Questions to answer: - What year of ACS data are you using? 2023 - How many tracts have missing income data? 66 - What is the median income across all PA census tracts? 72943.5


Step 3: Define Vulnerable Populations

Identify census tracts with vulnerable populations based on TWO criteria: 1. Low median household income (choose an appropriate threshold) 2. Significant elderly population (choose an appropriate threshold)

Your Task:

# Filter for vulnerable tracts based on your criteria

census_tracts_joined <- census_tracts_joined %>%
  mutate(
    percentage_65 = pop_65/total_populationE
  )

vulnerable_tracts <- census_tracts_joined %>%
  filter(median_household_incomeE < 48000,
         percentage_65 > 0.20
         )

Questions to answer: - What income threshold did you choose and why? For the income threshold, I chose 48,000 based on the 2025 poverty guidelines by the USCIS. For a family with four people, 150% of the federal poverty level is $48,225. - What elderly population threshold did you choose and why? For the age threshold, I chose the percentage of people over 65 years old is over 20%, which can be seen as an elderly community. - How many tracts meet your vulnerability criteria? 126 - What percentage of PA census tracts are considered vulnerable by your definition? the percentage of vulnerable tracts = 126 / 3445 = 3.66%


Step 4: Calculate Distance to Hospitals

For each vulnerable tract, calculate the distance to the nearest hospital.

Your Task:

# Transform to appropriate projected CRS

vulnerable_tracts_proj <- vulnerable_tracts %>%
  st_transform(5070) # 5070 is for the whole country. For south PA alone, we can use 3365, which is more precise.

hospitals_proj <- hospitals %>%
  st_transform(5070)


# Calculate distance from each tract centroid to nearest hospital

tract_centroids <- st_centroid(vulnerable_tracts_proj)

nearest_hospital_idx <- st_nearest_feature(tract_centroids, hospitals_proj)

dist_to_hospital <- st_distance(
  tract_centroids,
  hospitals_proj[nearest_hospital_idx, ],
  by_element = TRUE
)


vulnerable_tracts_proj <- vulnerable_tracts_proj %>%
  mutate(
    dist_to_hospital = as.numeric(dist_to_hospital),
    dist_to_hospital_mi = dist_to_hospital * 0.000621371
  )

distant_data <- vulnerable_tracts_proj %>%
  summarise(
    avg_distance = mean(dist_to_hospital_mi, na.rm = TRUE),
    max_dist = max(dist_to_hospital_mi, na.rm = TRUE),
  )

distant_data
Simple feature collection with 1 feature and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 1282205 ymin: 1990587 xmax: 1757922 ymax: 2252774
Projected CRS: NAD83 / Conus Albers
  avg_distance max_dist                       geometry
1     2.803376 18.55538 MULTIPOLYGON (((1347636 204...
num_over15 <- vulnerable_tracts_proj %>%
  filter(dist_to_hospital_mi > 15) %>%
  summarise(
    n_over15 = n(),
  )

num_over15
Simple feature collection with 1 feature and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 1461788 ymin: 2192308 xmax: 1464364 ymax: 2193760
Projected CRS: NAD83 / Conus Albers
  n_over15                       geometry
1        1 MULTIPOLYGON (((1461853 219...

Requirements: - Use an appropriate projected coordinate system for Pennsylvania - Calculate distances in miles - Explain why you chose your projection

Questions to answer: - What is the average distance to the nearest hospital for vulnerable tracts? 2.8 miles - What is the maximum distance? 18.6 miles - How many vulnerable tracts are more than 15 miles from the nearest hospital? 1


Step 5: Identify Underserved Areas

Define “underserved” as vulnerable tracts that are more than 15 miles from the nearest hospital.

Your Task:

# Create underserved variable

undeserved <- vulnerable_tracts_proj %>%
  filter(dist_to_hospital_mi > 15)

undeserved 
Simple feature collection with 1 feature and 22 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 1461788 ymin: 2192308 xmax: 1464364 ymax: 2193760
Projected CRS: NAD83 / Conus Albers
  STATEFP COUNTYFP TRACTCE              GEOIDFQ       GEOID NAME.x
1      42      023  960100 1400000US42023960100 42023960100   9601
           NAMELSAD STUSPS     NAMELSADCO   STATE_NAME LSAD   ALAND AWATER
1 Census Tract 9601     PA Cameron County Pennsylvania   CT 1838256 107252
                                           NAME.y total_populationE
1 Census Tract 9601; Cameron County; Pennsylvania              2005
  total_populationM median_household_incomeE median_household_incomeM pop_65
1               134                    36167                     6116    504
                        geometry percentage_65 dist_to_hospital
1 MULTIPOLYGON (((1461853 219...     0.2513716            29862
  dist_to_hospital_mi
1            18.55538

Questions to answer: - How many tracts are underserved? only 1! - What percentage of vulnerable tracts are underserved? 1/126 = 0.79% - Does this surprise you? Why or why not? No, I looked at the distance data of the vulnerable tracts, most of them (92.9%) are under 10 miles (16000m). I think 15 miles is a very big number, so we just have one tract fulfill all the requirements.


Step 6: Aggregate to County Level

Use spatial joins and aggregation to calculate county-level statistics about vulnerable populations and hospital access.

Your Task:

# Spatial join tracts to counties
vulnerable_tracts_proj <- st_transform(vulnerable_tracts_proj, st_crs(pa_counties))
undeserved  <- st_transform(undeserved , st_crs(pa_counties))

counties_num = census_tracts %>%
  st_centroid() %>%
  st_join(pa_counties%>% select(COUNTY_NAM)) %>%
  st_drop_geometry() %>%
  group_by(COUNTY_NAM) %>%
  summarize(
    n_tract =  n(),
  )%>%
  arrange(n_tract)

counties_vuln <- vulnerable_tracts_proj %>%
  st_centroid() %>%
  st_join(pa_counties%>% select(COUNTY_NAM)) %>%
  st_drop_geometry() %>%
  group_by(COUNTY_NAM) %>%
  summarize(
    n_vuln =  n(),
    avg_dist_vuln_mi = mean(dist_to_hospital_mi, na.rm = TRUE),
    total_vuln_pop = sum(pop_65, na.rm = TRUE)
  ) %>%
  arrange(desc(total_vuln_pop))

counties_unders <- undeserved %>%
  st_centroid() %>%
  st_join(pa_counties%>% select(COUNTY_NAM)) %>%
  st_drop_geometry() %>%
  group_by(COUNTY_NAM) %>%
  summarize(
    n_unders =  n(),
  ) %>%
  arrange(desc(n_unders))



# Aggregate statistics by county

counties_stat = counties_num %>%
  left_join(counties_vuln, by = "COUNTY_NAM") %>%
  left_join(counties_unders, by = "COUNTY_NAM") %>%
  mutate(
    percentage_unders = n_unders / n_vuln,
    percentage_vuln = n_vuln / n_tract
  )%>%
  arrange(desc(percentage_vuln))

Required county-level statistics: - Number of vulnerable tracts - Number of underserved tracts
- Percentage of vulnerable tracts that are underserved - Average distance to nearest hospital for vulnerable tracts - Total vulnerable population

Questions to answer: - Which 5 counties have the highest percentage of underserved vulnerable tracts? CAMERON is the highest, with 100% in the number. Since I only have 1 underserved tract, I would show top 5 counties with the highest percentage of vulnerable tracts: (1) CAMERON 50.0% (2) FAYETTE 19.4% (3) CAMBRIA 19.0% (4) JEFFERSON 15.4% (5) WARREN 15.4% - Which counties have the most vulnerable people living far from hospitals? PHILADELPHIA, with the number of 18022 - Are there any patterns in where underserved counties are located? Since I only have 1 underserved tract, I would observe patterns of vulnerable counties. (1) Most of them are from rural areas with less tracts number, so they have higher percentage of vulnerable tracts (2) They have large numbers of vulnerable population, which means lots of low-income and old people living there. Most of them have number over 1000.


Step 7: Create Summary Table

Create a professional table showing the top 10 priority counties for healthcare investment.

Your Task:

# Create and format priority counties table
# metrics for the priority: vulnerable index = normalized_number_of_vulnerable_tracts*0.3 + normalized_average_distant _hospital*0.5 + normalized_total_population*0.2

scale_minmax <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (!is.finite(diff(rng)) || diff(rng) == 0) return(rep(0.5, length(x)))
  (x - rng[1]) / diff(rng)
}

counties_stat_index <- counties_stat %>%
  mutate(
    n_vuln_norm = scale_minmax(n_vuln),
    avg_dist_vuln_mi_norm = scale_minmax(avg_dist_vuln_mi),
    total_vuln_pop_norm = scale_minmax(total_vuln_pop),
    percentage_vuln_norm = scale_minmax(percentage_vuln)
  ) %>%
   mutate(
     vuln_index =
       0.3  * n_vuln_norm +
       0.5 * avg_dist_vuln_mi_norm +
       0.2 * total_vuln_pop_norm
   ) %>%
  arrange(desc(vuln_index))

counties_stat_index
# A tibble: 68 × 13
   COUNTY_NAM     n_tract n_vuln avg_dist_vuln_mi total_vuln_pop n_unders
   <chr>            <int>  <int>            <dbl>          <dbl>    <int>
 1 ALLEGHENY          394     28            2.39           15213       NA
 2 CAMERON              2      1           18.6              504        1
 3 PHILADELPHIA       407     19            0.887          18022       NA
 4 HUNTINGDON          13      1           14.2              730       NA
 5 NORTHUMBERLAND      25      3           10.8             1603       NA
 6 CAMBRIA             42      8            4.15            4588       NA
 7 VENANGO             17      1            8.40             363       NA
 8 FAYETTE             36      7            2.55            5780       NA
 9 ERIE                72      7            2.35            5923       NA
10 WARREN              13      2            6.19            1257       NA
# ℹ 58 more rows
# ℹ 7 more variables: percentage_unders <dbl>, percentage_vuln <dbl>,
#   n_vuln_norm <dbl>, avg_dist_vuln_mi_norm <dbl>, total_vuln_pop_norm <dbl>,
#   percentage_vuln_norm <dbl>, vuln_index <dbl>
#make table

counties_table <- counties_stat_index %>%
  select(
    county_name = COUNTY_NAM,
    vulnerable_index = vuln_index,
    vulnerable_tract_number = n_vuln,
    average_dist_to_hospital = avg_dist_vuln_mi,
    total_vulnerable_pop = total_vuln_pop
  ) %>%
  mutate(
    vulnerable_index = round(vulnerable_index, 3),
    average_dist_to_hospital = round(average_dist_to_hospital, 2),
    total_vulnerable_pop = comma(total_vulnerable_pop)
  )%>%
  arrange(desc(vulnerable_index))

kable(
  counties_table,
  caption = "PA County-level Vulnerability Summary",
  align = "lcccc"
)
PA County-level Vulnerability Summary
county_name vulnerable_index vulnerable_tract_number average_dist_to_hospital total_vulnerable_pop
ALLEGHENY 0.519 28 2.39 15,213
CAMERON 0.502 1 18.56 504
PHILADELPHIA 0.409 19 0.89 18,022
HUNTINGDON 0.383 1 14.18 730
NORTHUMBERLAND 0.321 3 10.81 1,603
CAMBRIA 0.226 8 4.15 4,588
VENANGO 0.218 1 8.40 363
FAYETTE 0.183 7 2.55 5,780
ERIE 0.179 7 2.35 5,923
WARREN 0.178 2 6.19 1,257
LANCASTER 0.175 1 6.60 1,034
LUZERNE 0.153 6 2.51 4,195
WESTMORELAND 0.150 6 2.86 3,057
LAWRENCE 0.149 3 4.52 1,793
NORTHAMPTON 0.130 1 5.03 850
CLINTON 0.127 1 4.89 913
DELAWARE 0.120 6 1.32 4,179
LACKAWANNA 0.104 5 1.67 2,871
LEHIGH 0.092 1 3.42 1,417
BEAVER 0.076 2 2.53 1,291
BLAIR 0.076 2 2.18 2,079
INDIANA 0.070 1 2.63 1,447
WASHINGTON 0.068 2 2.39 915
LYCOMING 0.066 1 2.90 470
MERCER 0.063 3 1.42 1,866
JEFFERSON 0.057 2 1.72 1,566
MIFFLIN 0.038 1 1.87 516
YORK 0.028 1 1.55 425
CRAWFORD 0.015 1 0.93 787
WAYNE 0.014 1 0.83 951
SOMERSET 0.008 1 0.82 399
BUTLER 0.000 1 0.56 400
NA NA NA NA NA
FOREST NA NA NA NA
FULTON NA NA NA NA
MONTOUR NA NA NA NA
SULLIVAN NA NA NA NA
JUNIATA NA NA NA NA
POTTER NA NA NA NA
WYOMING NA NA NA NA
SNYDER NA NA NA NA
ELK NA NA NA NA
GREENE NA NA NA NA
PERRY NA NA NA NA
TIOGA NA NA NA NA
UNION NA NA NA NA
BEDFORD NA NA NA NA
MCKEAN NA NA NA NA
SUSQUEHANNA NA NA NA NA
CLARION NA NA NA NA
BRADFORD NA NA NA NA
COLUMBIA NA NA NA NA
CARBON NA NA NA NA
ARMSTRONG NA NA NA NA
CLEARFIELD NA NA NA NA
PIKE NA NA NA NA
ADAMS NA NA NA NA
FRANKLIN NA NA NA NA
LEBANON NA NA NA NA
CENTRE NA NA NA NA
SCHUYLKILL NA NA NA NA
MONROE NA NA NA NA
CUMBERLAND NA NA NA NA
DAUPHIN NA NA NA NA
BERKS NA NA NA NA
CHESTER NA NA NA NA
BUCKS NA NA NA NA
MONTGOMERY NA NA NA NA

Requirements: - Use knitr::kable() or similar for formatting - Include descriptive column names - Format numbers appropriately (commas for population, percentages, etc.) - Add an informative caption - Sort by priority (you decide the metric)


Part 2: Comprehensive Visualization

Using the skills from Week 3 (Data Visualization), create publication-quality maps and charts.

Map 1: County-Level Choropleth

Create a choropleth map showing healthcare access challenges at the county level.

Your Task:

# Create county-level access map
# I use percentage of vulnerable tract here

library(ggplot2)

counties_stat_sf <- pa_counties %>%
  select(COUNTY_NAM, geometry) %>%
  left_join(counties_stat, by = "COUNTY_NAM")

p1 <- ggplot() +
  # county base map
  geom_sf(
    data = counties_stat_sf,
    aes(fill = percentage_vuln),
    color = "white", size = 0.2
  ) +
  # hospitals
  geom_sf(
    data = hospitals_proj,
    shape = 21, fill = "red",
    size = 1.8, alpha = 0.9
  ) +
  # legend
  scale_fill_viridis_c(
    name = "% vulnerable tracts",
    labels = label_percent(accuracy = 1),
    limits = c(0, 1),
    option = "C",      
    direction = -1
  ) +
  coord_sf() +
  theme_void(base_size = 12) +
  theme(
    legend.position = "right",
    legend.title = element_text(size = 10, face = "bold"),
    legend.text  = element_text(size = 9),
    plot.title   = element_text(face = "bold", size = 16),
    plot.subtitle= element_text(size = 12)
  ) +
  labs(
    title    = "Healthcare Access Challenges by County",
    subtitle = "share of vulnerable census tracts",
    caption  = "Source: analysis based on income and percentage of people over 65 years old."
  ) +
  guides(fill = guide_colorbar(
    barheight = unit(70, "pt"),
    barwidth  = unit(8,  "pt"),
    ticks.colour = "black"
  ))

p1

Requirements: - Fill counties by percentage of vulnerable tracts that are underserved - Include hospital locations as points - Use an appropriate color scheme - Include clear title, subtitle, and caption - Use theme_void() or similar clean theme - Add a legend with formatted labels


Map 2: Detailed Vulnerability Map

Create a map highlighting underserved vulnerable tracts.

Your Task:

# Create detailed tract-level map

p2 <- ggplot() +
  # tract base map
  geom_sf(
    data = census_tracts,
    color = "grey", size = 0.1
  ) +
  # county base map
  geom_sf(
    data = pa_counties,
    color = "lightblue", size = 0.5,fill = NA,
  ) +
  # hospitals
  geom_sf(
    data = hospitals_proj,
    shape = 16, color = "red",
    size = 1, alpha = 0.9
  ) +
  # underserved community
  geom_sf(
    data = undeserved,
    color = "blue"
  )+
  coord_sf() +
  theme_void(base_size = 12) +
  theme(
    plot.title   = element_text(face = "bold", size = 16),
    plot.subtitle= element_text(size = 12)
  ) +
  labs(
    title    = "Underserved Communities",
    subtitle = "Blue tracts represent underserved communities; red points show hospital locations."
  ) +
  guides(fill = guide_colorbar(
    barheight = unit(70, "pt"),
    barwidth  = unit(8,  "pt"),
    ticks.colour = "black"
  ))

p2

#only one very small blue tract in the center of the map

Requirements: - Show underserved vulnerable tracts in a contrasting color - Include county boundaries for context - Show hospital locations - Use appropriate visual hierarchy (what should stand out?) - Include informative title and subtitle


Chart: Distribution Analysis

Create a visualization showing the distribution of distances to hospitals for vulnerable populations.

Your Task:

# Create distribution visualization

# Histogram or density plot of distances
ggplot(vulnerable_tracts_proj) +
  aes(x = dist_to_hospital_mi) +
  geom_histogram(bins = 15, fill = "steelblue", alpha = 0.7) +
  labs(
    title = "Distribution of distance to the nearest hospital",
    x = "distance to the nearest hospital",
    y = "Number of vulnerable tracts",
    caption  = "Most of tracts have distance between 0 and 5."
  ) +
  theme_minimal()

# Box plot comparing distances across regions
ggplot(vulnerable_tracts_proj) +
  aes(x = NAMELSADCO, y = dist_to_hospital_mi, fill = NAMELSADCO) +
  geom_boxplot() +
  labs(
    title = "Distances by County Category",
    x = "County",
    y = "Distance",
    caption  = "While most counties show little variation, four counties have huge internal differences"
  ) +
  theme_minimal() +
  theme(legend.position = "none")+
  theme(
  axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, size = 7)
)

# Bar chart of underserved tracts by county
#only 1 underserved tracts in my study, it is Census Tract 9601 in the Cameron County


# Scatter plot of distance vs. vulnerable population size
ggplot(vulnerable_tracts_proj) +
  aes(x = dist_to_hospital_mi, y = total_populationE) +
  geom_point(color = "steelblue", alpha = 0.6, size = 2) +
  geom_smooth(
    method = "lm",              
    se = TRUE,                  
    color = "darkred",          
    fill = "pink",              
    linewidth = 1
  ) +
  labs(
    title = "distance vs. vulnerable population size",
    x = "Distance to the nearest hospital",
    y = "Vulnerable population size",
    caption  = str_wrap("More vulnerable population, little distance to hospital. Since hospitals are designed to located near population clusters, and we didn't limit this effect in this study ", width = 90)
  ) +
  theme_minimal() +
  scale_x_continuous() +
  scale_y_continuous(labels = comma)

Suggested chart types: - Histogram or density plot of distances - Box plot comparing distances across regions - Bar chart of underserved tracts by county - Scatter plot of distance vs. vulnerable population size

Requirements: - Clear axes labels with units - Appropriate title - Professional formatting - Brief interpretation (1-2 sentences as a caption or in text)


Part 3: Bring Your Own Data Analysis

Choose your own additional spatial dataset and conduct a supplementary analysis.

Challenge Options

Choose ONE of the following challenge exercises, or propose your own research question using OpenDataPhilly data (https://opendataphilly.org/datasets/).

Note these are just loose suggestions to spark ideas - follow or make your own as the data permits and as your ideas evolve. This analysis should include bringing in your own dataset, ensuring the projection/CRS of your layers align and are appropriate for the analysis (not lat/long or geodetic coordinate systems). The analysis portion should include some combination of spatial and attribute operations to answer a relatively straightforward question


Education & Youth Services

Option A: Educational Desert Analysis - Data: Schools, Libraries, Recreation Centers, Census tracts (child population) - Question: “Which neighborhoods lack adequate educational infrastructure for children?” - Operations: Buffer schools/libraries (0.5 mile walking distance), identify coverage gaps, overlay with child population density - Policy relevance: School district planning, library placement, after-school program siting

Option B: School Safety Zones - Data: Schools, Crime Incidents, Bike Network - Question: “Are school zones safe for walking/biking, or are they crime hotspots?” - Operations: Buffer schools (1000ft safety zone), spatial join with crime incidents, assess bike infrastructure coverage - Policy relevance: Safe Routes to School programs, crossing guard placement


Environmental Justice

Option C: Green Space Equity - Data: Parks, Street Trees, Census tracts (race/income demographics) - Question: “Do low-income and minority neighborhoods have equitable access to green space?” - Operations: Buffer parks (10-minute walk = 0.5 mile), calculate tree canopy or park acreage per capita, compare by demographics - Policy relevance: Climate resilience, environmental justice, urban forestry investment —

Public Safety & Justice

Option D: Crime & Community Resources - Data: Crime Incidents, Recreation Centers, Libraries, Street Lights - Question: “Are high-crime areas underserved by community resources?” - Operations: Aggregate crime counts to census tracts or neighborhoods, count community resources per area, spatial correlation analysis - Policy relevance: Community investment, violence prevention strategies —

Infrastructure & Services

Option E: Polling Place Accessibility - Data: Polling Places, SEPTA stops, Census tracts (elderly population, disability rates) - Question: “Are polling places accessible for elderly and disabled voters?” - Operations: Buffer polling places and transit stops, identify vulnerable populations, find areas lacking access - Policy relevance: Voting rights, election infrastructure, ADA compliance


Health & Wellness

Option F: Recreation & Population Health - Data: Recreation Centers, Playgrounds, Parks, Census tracts (demographics) - Question: “Is lack of recreation access associated with vulnerable populations?” - Operations: Calculate recreation facilities per capita by neighborhood, buffer facilities for walking access, overlay with demographic indicators - Policy relevance: Public health investment, recreation programming, obesity prevention


Emergency Services

Option G: EMS Response Coverage - Data: Fire Stations, EMS stations, Population density, High-rise buildings - Question: “Are population-dense areas adequately covered by emergency services?” - Operations: Create service area buffers (5-minute drive = ~2 miles), assess population coverage, identify gaps in high-density areas - Policy relevance: Emergency preparedness, station siting decisions


Arts & Culture

Option H: Cultural Asset Distribution - Data: Public Art, Museums, Historic sites/markers, Neighborhoods - Question: “Do all neighborhoods have equitable access to cultural amenities?” - Operations: Count cultural assets per neighborhood, normalize by population, compare distribution across demographic groups - Policy relevance: Cultural equity, tourism, quality of life, neighborhood identity


Data Sources

OpenDataPhilly: https://opendataphilly.org/datasets/ - Most datasets available as GeoJSON, Shapefile, or CSV with coordinates - Always check the Metadata for a data dictionary of the fields.

Additional Sources: - Pennsylvania Open Data: https://data.pa.gov/ - Census Bureau (via tidycensus): Demographics, economic indicators, commute patterns - TIGER/Line (via tigris): Geographic boundaries

Your Analysis

Your Task:

  1. Find and load additional data
    • Document your data source
    • Check and standardize the CRS
    • Provide basic summary statistics
# Load your additional dataset

library(sf)

schools <- st_read("Data/Schools/Schools.shp")
Reading layer `Schools' from data source 
  `/Users/jinhengcen/Documents/portfolio-setup-CenJinHeng/assignments/assignment_2/data/Schools/Schools.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 495 features and 14 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -8378628 ymin: 4852555 xmax: -8345686 ymax: 4884813
Projected CRS: WGS 84 / Pseudo-Mercator
crime <- st_read("Data/Crime/Crime.shp")
Reading layer `Crime' from data source 
  `/Users/jinhengcen/Documents/portfolio-setup-CenJinHeng/assignments/assignment_2/data/Crime/Crime.shp' 
  using driver `ESRI Shapefile'
replacing null geometries with empty geometries
Simple feature collection with 160388 features and 13 fields (with 6744 geometries empty)
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -80.50237 ymin: 39.87688 xmax: -74.95784 ymax: 42.22434
Geodetic CRS:  WGS 84
bike_network <- st_read("Data/Bike_Network/Bike_Network.shp")
Reading layer `Bike_Network' from data source 
  `/Users/jinhengcen/Documents/portfolio-setup-CenJinHeng/assignments/assignment_2/data/Bike_Network/Bike_Network.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 5225 features and 8 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: -8378937 ymin: 4847835 xmax: -8345146 ymax: 4883978
Projected CRS: WGS 84 / Pseudo-Mercator
schools <- st_transform(schools, 2272)
crime <- st_transform(crime, 2272)
bike_network <- st_transform(bike_network, 2272)

library(dplyr)
library(ggplot2)

## school type
schools %>%
  st_drop_geometry() %>%
  count(type_speci, sort = TRUE) %>%
  ggplot(aes(x = reorder(type_speci, n), y = n)) +
  geom_bar(stat = "identity", fill = "lightblue") +
  coord_flip() +
  labs(
    title = "Distribution of school types",
    x = "School type",
    y = "Number"
  ) +
  theme_minimal(base_size = 13)

# Crime Hour
crime %>%
  st_drop_geometry() %>%
  count(hour) %>%
  ggplot(aes(x = hour, y = n)) +
  geom_col(fill = "firebrick") +
  labs(
    title = "Distribution of crime hour",
    x = "Hour",
    y = "Number of incidents"
  ) +
  theme_minimal(base_size = 13)

#crime types
crime %>%
  st_drop_geometry() %>%
  count(text_gener, sort = TRUE) %>%
  slice_max(n, n = 10) %>%  
  ggplot(aes(x = reorder(text_gener, n), y = n)) +
  geom_bar(stat = "identity", fill = "orange") +
  coord_flip() +
  labs(
    title = "Top 10 Crime Types",
    x = "Types",
    y = "Number of incidents"
  ) +
  theme_minimal(base_size = 13)

# Bike lane
bike_length_summary <- bike_network %>%
  st_drop_geometry() %>%
  group_by(TYPE) %>%
  summarise(total_length = sum(Shape__Len, na.rm = TRUE)) %>%
  arrange(desc(total_length))

ggplot(bike_length_summary, aes(x = reorder(TYPE, total_length), y = total_length)) +
  geom_bar(stat = "identity", fill = "forestgreen") +
  coord_flip() +
  labs(
    title = "Distribution of Bike Lane Type",
    x = "Type",
    y = "Total Length"
  ) +
  theme_minimal(base_size = 13)

Questions to answer: - What dataset did you choose and why? Three data sets are chosen: - Schools data in shp format - Crime incidents data points in shp format - Bike_Network in shp format Since I am going to calculate some metrics like the numbers of crime incidents and the length of bike lanes within school buffers, these data are needed. - What is the data source and date? They are all from OpenDataPhilly. All of them are up_to_date. The crime incidents data is from 2024, because we are in the October of the 2025, so we don’t have data for the whole 2025. Using 2024 can avoid some time variation. - How many features does it contain? All of them have geometry, which means we can run them in spatial analysis. For the school data set, we got names, types, grade levels, phone numbers, street names, and zip codes. For the crime data, we have precise time, crime types, location details. For the bike network data, we knew street names, types, lengths and oneway.
- What CRS is it in? Did you need to transform it? EPSG:2272, which is used in south Pennsylvania.


  1. Pose a research question

Write a clear research statement that your analysis will answer.

Examples: - “Do vulnerable tracts have adequate public transit access to hospitals?” - “Are EMS stations appropriately located near vulnerable populations?” - “Do areas with low vehicle access have worse hospital access?”

My question is “Are school zones safe for walking/biking, or are they crime hotspots?”


  1. Conduct spatial analysis

Use at least TWO spatial operations to answer your research question.

Required operations (choose 2+): - Buffers - Spatial joins - Spatial filtering with predicates - Distance calculations - Intersections or unions - Point-in-polygon aggregation

Your Task:

# Your spatial analysis

# I have loaded data sets and make simple statistical analysis for each of them in the previous chunk

# Create buffer for schools

schools <- schools %>% mutate(school_id = row_number())
zones <- st_buffer(schools, dist = 1000)

# Calculate the crime number

crime_by_zone <- crime %>%
  st_join(zones %>% select(school_id), join = st_within) %>%
  st_drop_geometry() %>%
  count(school_id, name = "crime_count")


# Calculate the bike lane length
bike_in_zone <- st_intersection(bike_network, zones %>% select(school_id))

bike_len_by_zone <- bike_in_zone %>%
  mutate(len_ft = as.numeric(st_length(.))) %>% 
  st_drop_geometry() %>%
  group_by(school_id) %>%
  summarise(bike_len_ft = sum(len_ft, na.rm = TRUE), .groups = "drop")

# Combine the result
zone_stats <- zones %>%
  left_join(crime_by_zone,    by = "school_id") %>%
  left_join(bike_len_by_zone, by = "school_id")

# Calculate average number of the crime incidents and bike lane length
zone_summary <- zone_stats %>%
  st_drop_geometry() %>%
  summarise(
    mean_crime = mean(crime_count, na.rm = TRUE),
    mean_bike_len = mean(bike_len_ft, na.rm = TRUE)
  )

city_summary <- tibble(
  mean_crime = nrow(crime) / nrow(zones),
  mean_bike_len = sum(st_length(bike_network)) / nrow(zones)
)

# Combine results
comparison <- rbind(
  data.frame(Category = "School Zones", 
             mean_crime = zone_summary$mean_crime, 
             mean_bike_len = zone_summary$mean_bike_len),
  data.frame(Category = "Citywide Average",
             mean_crime = city_summary$mean_crime,
             mean_bike_len = city_summary$mean_bike_len)
)

# Draw diagram
ggplot(comparison, aes(x = Category, y = mean_crime, fill = Category)) +
  geom_bar(stat = "identity", width = 0.6) +
  labs(
    title = "Comperison of Average Crime Incidents: School Zones VS City",
    y = "Averge Crime Incidents"
  ) +
  theme_minimal(base_size = 14) +
  theme(legend.position = "none")

ggplot(comparison, aes(x = Category, y = mean_bike_len, fill = Category)) +
  geom_bar(stat = "identity", width = 0.6) +
  labs(
    title = "Comperison of Bike Lane Length: School Zones VS City",
    y = "Average Bike Lane Length"
  ) +
  theme_minimal(base_size = 14) +
  theme(legend.position = "none")

# Mapping

# Get tracts of Philly
tracts_phl <- get_acs(
  geography = "tract",
  state     = "PA",
  county    = "Philadelphia",
  variables = "B01003_001",
  survey    = "acs5",
  year      = 2022,
  geometry  = TRUE,
  cache_table = TRUE
) %>%
  select(GEOID, NAME, pop = estimate, geometry) %>%
  st_transform(2272)    

  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=                                                                     |   1%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |===                                                                   |   5%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |=====                                                                 |   8%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |=========                                                             |  12%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |===============                                                       |  21%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |=================                                                     |  25%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |====================                                                  |  28%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |=======================                                               |  32%
  |                                                                            
  |========================                                              |  34%
  |                                                                            
  |========================                                              |  35%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |==========================                                            |  38%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |==============================                                        |  42%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |================================                                      |  45%
  |                                                                            
  |================================                                      |  46%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |==================================                                    |  49%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================                                  |  51%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |======================================                                |  55%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |=========================================                             |  58%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |=============================================                         |  65%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |===============================================                       |  68%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |=================================================                     |  71%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |======================================================                |  78%
  |                                                                            
  |=======================================================               |  79%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |=============================================================         |  88%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |===============================================================       |  89%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |=================================================================     |  92%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |===================================================================   |  95%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |===================================================================== |  98%
  |                                                                            
  |===================================================================== |  99%
  |                                                                            
  |======================================================================| 100%
# point_on_surface
zone_points <- zone_stats |>
  mutate(geometry = st_point_on_surface(geometry),
         bike_len_mile = bike_len_ft / 5280) |>
  st_as_sf()

# Crime point map
ggplot() +
  geom_sf(data = tracts_phl, fill = NA, color = "grey", linewidth = 0.2) +
  geom_sf(data = zone_points, aes(size = crime_count), color = "firebrick", alpha = 0.6) +
  scale_size_continuous(name = "Crime count", range = c(0, 4)) +
  labs(title = "School Zones: Crime Counts (Point size = count)", x = NULL, y = NULL) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "right")

# Bike lane point map
ggplot() +
  geom_sf(data = tracts_phl, fill = NA, color = "grey", linewidth = 0.2) +
  geom_sf(data = zone_points, aes(size = bike_len_mile), color = "forestgreen", alpha = 0.6) +
  scale_size_continuous(name = "Bike lane (miles)", range = c(0.1, 2)) +
  labs(title = "School Zones: Bike Lane Length (Point size = miles)", x = NULL, y = NULL) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "right")

Analysis requirements: - Clear code comments explaining each step - Appropriate CRS transformations - Summary statistics or counts - At least one map showing your findings - Brief interpretation of results (3-5 sentences)

Your interpretation:

School zones have fewer crime incidents and bike lane length compared to city average. More pedestrian-friendly projects should be applied to school areas. For the spatial difference, higher concentrations of crimes are observed around schools in central and southern Philadelphia. School zones with longer bike lane coverage are mostly located in the city’s core and western areas. Planning efforts should focus on improving safety in school zones with higher crime risks, especially in central and southern Philadelphia. Expanding connected and protected bike lanes around schools can promote safer and more accessible environments for students.

Finally - A few comments about your incorporation of feedback!

Take a few moments to clean up your markdown document and then write a line or two or three about how you may have incorporated feedback that you recieved after your first assignment.

I need to hide my API Key!


Submission Requirements

What to submit:

  1. Rendered HTML document posted to your course portfolio with all code, outputs, maps, and text
    • Use embed-resources: true in YAML so it’s a single file
    • All code should run without errors
    • All maps and charts should display correctly

File naming: LastName_FirstName_Assignment2.html and LastName_FirstName_Assignment2.qmd